-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for backward sync wrongly thinking it is done after a restart #5182
Conversation
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
…u restart" This reverts commit e7ac9e5. Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net> # Conflicts: # ethereum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardSyncContext.java
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Hi @fab-10 can you add details of any testing you've done to the description please. |
To test I have used a mainnet Lighthouse-Besu, with a long backward sync (days), that was experiencing the problem, and after applying this fix, I have restarted it many times, this issue was not reported anymore and the sync eventually finished. Note: in case you apply this fix to an instance that is currently doing a backward sync, of course on the first start it still does not have the stored state, so it could still report the issue, so the workaround is: delete the CL data and let it checkpoint sync (it only takes seconds) and restart Besu, so the CL could send a fresh hash that is not in the backward sync storage |
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know this code well enough to approve, trying to learn it hence the questions!
...eum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardChain.java
Outdated
Show resolved
Hide resolved
...eum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardChain.java
Show resolved
Hide resolved
...eum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardChain.java
Show resolved
Hide resolved
...eum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardChain.java
Outdated
Show resolved
Hide resolved
.log(); | ||
|
||
if (firstStoredAncestor.isEmpty()) { | ||
updateLastStorePivot(Optional.of(blockHeader)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we add the blockHeader to the chainStorage here as well?
Does firstStoredAncestor.isEmpty represent the first BWS block that we've received?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we add the blockHeader to the chainStorage here as well?
chainStorage
is the sequence of the blocks to import, each entry is blockHash -> nextBlockHash, so you know what block to import next, while the block headers are saved in another table.
Does firstStoredAncestor.isEmpty represent the first BWS block that we've received?
it is empty at the beginning of the bws, and then it is updated when going backward/forward to point to the current block, and this is one of the variables that was not stored, but is required for to resume the session
...src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardSyncAlgorithm.java
Show resolved
Hide resolved
...eum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardChain.java
Outdated
Show resolved
Hide resolved
...eum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardChain.java
Show resolved
Hide resolved
tried the heal with this PR and it seems to be good |
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok to me - one minor comment
...h/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardSyncContext.java
Outdated
Show resolved
Hide resolved
.../eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardSyncStep.java
Show resolved
Hide resolved
Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com> Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…perledger#5182) <!-- Thanks for sending a pull request! Please check out our contribution guidelines: --> <!-- https://github.com/hyperledger/besu/blob/main/CONTRIBUTING.md --> ## PR description There is an issue when restarting Besu when a backward sync session is running, since after the restart it is possible that the Consensus client sends a FcU or a NewPayload for a block that is present in the backward sync storage, but not yet imported, so not on the main chain, but still the backward sync thinks it should not do anything with that block, so it returns like it has completed the sync, but since the sync is not done actually then the internal error that the finalize block is not present. The solution is to persist the backward sync status, so in case of a restart, it can resume from where it was interrupted. ## Fixed Issue(s) <!-- Please link to fixed issue(s) here using format: fixes #<issue number> --> <!-- Example: "fixes #2" --> fixes hyperledger#5053 ## Documentation - [x] I thought about documentation and added the `doc-change-required` label to this PR if [updates are required](https://wiki.hyperledger.org/display/BESU/Documentation). ## Acceptance Tests (Non Mainnet) - [x] I have considered running `./gradlew acceptanceTestNonMainnet` locally if my PR affects non-mainnet modules. ## Changelog - [x] I thought about the changelog and included a [changelog update if required](https://wiki.hyperledger.org/display/BESU/Changelog). --------- Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net> Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
…perledger#5182) <!-- Thanks for sending a pull request! Please check out our contribution guidelines: --> <!-- https://github.com/hyperledger/besu/blob/main/CONTRIBUTING.md --> ## PR description There is an issue when restarting Besu when a backward sync session is running, since after the restart it is possible that the Consensus client sends a FcU or a NewPayload for a block that is present in the backward sync storage, but not yet imported, so not on the main chain, but still the backward sync thinks it should not do anything with that block, so it returns like it has completed the sync, but since the sync is not done actually then the internal error that the finalize block is not present. The solution is to persist the backward sync status, so in case of a restart, it can resume from where it was interrupted. ## Fixed Issue(s) <!-- Please link to fixed issue(s) here using format: fixes #<issue number> --> <!-- Example: "fixes #2" --> fixes hyperledger#5053 ## Documentation - [x] I thought about documentation and added the `doc-change-required` label to this PR if [updates are required](https://wiki.hyperledger.org/display/BESU/Documentation). ## Acceptance Tests (Non Mainnet) - [x] I have considered running `./gradlew acceptanceTestNonMainnet` locally if my PR affects non-mainnet modules. ## Changelog - [x] I thought about the changelog and included a [changelog update if required](https://wiki.hyperledger.org/display/BESU/Changelog). --------- Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net> Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
PR description
There is an issue when restarting Besu when a backward sync session is running, since after the restart it is possible that the Consensus client sends a FcU or a NewPayload for a block that is present in the backward sync storage, but not yet imported, so not on the main chain, but still the backward sync thinks it should not do anything with that block, so it returns like it has completed the sync, but since the sync is not done actually then the internal error that the finalize block is not present.
The solution is to persist the backward sync status, so in case of a restart, it can resume from where it was interrupted.
Fixed Issue(s)
fixes #5053
Documentation
doc-change-required
label to this PR ifupdates are required.
Acceptance Tests (Non Mainnet)
./gradlew acceptanceTestNonMainnet
locally if my PR affects non-mainnet modules.Changelog